Metadata-Based Parallelization of Program Instrumentation

Authors

  • Matthew D. Allen
  • Gurindar S. Sohi
Abstract

Program instrumentation has a wide variety of useful applications, but tool writers must overcome the challenge of substantial overheads caused by introducing additional code and data into a program. This paper observes that instrumentation usually operates on many discrete, independent data structures, a property we call metadata parallelism. We propose to exploit this phenomenon to reduce the overhead of instrumented programs by executing instrumentation function invocations that manipulate different pieces of metadata simultaneously in different threads. The key challenge in spreading instrumentation function execution across many threads is ensuring that metadata updates occur in the correct order and do not suffer from data races. Metadata-based parallelization solves this problem by using a user-specified mapping of instrumentation function invocations to serialization sets. The runtime ensures that metadata updates are handled correctly by executing all function invocations in a given serialization set in the same thread. It achieves concurrency by spreading different serialization sets across multiple threads. Metadata-based parallelization improves on previous techniques for reducing the overhead of program instrumentation for a broad class of dynamic monitoring tools, including those that measure common-case behavior, such as profilers, and those that check for anomalous behavior, such as debugging and testing tools. Our technique allows tool developers to leverage parallelism with a natural, intuitive programming interface, leaving the burden of correct synchronization of the parallelized execution to the instrumentation system. We have modified the EEL instrumentation system to support metadata-based parallelization, and we evaluate our prototype by comparing the performance of parallelized instrumentation on both multicore and SMP systems. We show that the fast communication provided by the multicore system is a key enabler for fine-grained parallelization, achieving speedups averaging 4.3X for value profiling and 2.9X for data dependence profiling using 8 additional thread contexts.
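
The serialization-set scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' EEL-based implementation: the names `SerializationSetRuntime`, `delegate`, `finish`, and `run_value_profile` are hypothetical, and assigning sets to workers by hashing is an assumption made for illustration. The sketch only shows the ordering guarantee: every invocation mapped to the same serialization set runs on the same worker thread, in program order.

```python
import queue
import threading


class SerializationSetRuntime:
    """Hypothetical runtime: one FIFO queue and one thread per worker."""

    def __init__(self, num_workers, serializer):
        # serializer is the user-specified mapping from an invocation's
        # arguments to a serialization set identifier.
        self._serializer = serializer
        self._queues = [queue.Queue() for _ in range(num_workers)]
        self._threads = [
            threading.Thread(target=self._drain, args=(q,)) for q in self._queues
        ]
        for t in self._threads:
            t.start()

    @staticmethod
    def _drain(q):
        # Run queued invocations in FIFO order until the stop marker arrives.
        while True:
            item = q.get()
            if item is None:
                return
            func, args = item
            func(*args)

    def delegate(self, func, *args):
        # All invocations in one serialization set go to the same queue,
        # so they execute on one thread in the order they were delegated.
        set_id = self._serializer(*args)
        self._queues[hash(set_id) % len(self._queues)].put((func, args))

    def finish(self):
        # Send a stop marker to every worker and wait for them to drain.
        for q in self._queues:
            q.put(None)
        for t in self._threads:
            t.join()


def run_value_profile(trace, num_workers=4):
    """Toy value profiler: metadata is one list of observed values per pc."""
    # Pre-create the metadata so workers never change the dict's structure.
    profiles = {pc: [] for pc, _ in trace}

    def record_value(pc, value):
        profiles[pc].append(value)

    # Assumed mapping: each instruction address is its own serialization set.
    runtime = SerializationSetRuntime(num_workers, serializer=lambda pc, value: pc)
    for pc, value in trace:
        runtime.delegate(record_value, pc, value)
    runtime.finish()
    return profiles
```

Because each per-address list is only ever touched by one worker, `record_value` needs no locks. In CPython the global interpreter lock limits real parallel speedup, so this sketch demonstrates the ordering and race-freedom argument rather than the performance results reported above.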

Similar resources

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods for building an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, which consists of a large number of diverse sub-models in both the spatial and transform domains, all of which should be utilized. However, extracting the various types of features from an image is very time consuming in some steps, especially for training pha...

Automatic Parallelization - New Approaches to Code Generation, Data Distribution, and Performance Prediction

This paper introduces the Weight Finder, an advanced profiler for Fortran programs, which is based on a von Neumann architecture. Existing Fortran codes are generally too large to analyze fully in depth with respect to performance tuning. It is the responsibility of the Weight Finder to detect the most important regions of code in the program, as far as execution time is concerned. ...

Metadata Enrichment for Automatic Data Entry Based on Relational Data Models

Automatically generating data entry forms from relational data models is a common and well-known idea that has been discussed more and more as agile methods in software development have grown popular alongside the development of programming tools. One of the requirements of the automation methods, whether in commercial products or the relevant research projects, ...

Contech: Parallel Program Representation and High Performance Instrumentation

This summary of my dissertation work explores a pair of problems: how can a parallel program’s execution be comprehensively represented? How would this representation be efficiently generated from the program’s execution? I demonstrated that the behavior and structure of a shared-memory parallel program can be characterized by a task graph that encodes the instructions, memory accesses, and depe...

Support for Debugging Automatically Parallelized Programs

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals ...

Publication date: 2007